The version of R is as listed below.
version
## _
## platform x86_64-w64-mingw32
## arch x86_64
## os mingw32
## crt ucrt
## system x86_64, mingw32
## status
## major 4
## minor 2.2
## year 2022
## month 10
## day 31
## svn rev 83211
## language R
## version.string R version 4.2.2 (2022-10-31 ucrt)
## nickname Innocent and Trusting
The version of Python is listed below.
import sys
sys.version
## '3.8.15 | packaged by conda-forge | (default, Nov 22 2022, 08:42:03) [MSC v.1929 64 bit (AMD64)]'
The purpose of this assignment is to recreate the base R code of the textbook in higher-level R and Python. Each example appears in up to three blocks of code that do the same thing: one in base R, one in fancy R, and one in Python. The original base R code is taken from https://hastie.su.domains/ISLR2/Labs/Rmarkdown_Notebooks/Ch2-statlearn-lab.html.
All other code is my own.
First we install the packages needed for this notebook (I have already installed them, so the line is commented out). Then we load the libraries.
# install.packages(c("dplyr", "plotly", "htmlwidgets", "GGally"))
library(dplyr) # Used for fancier R manipulation
library(plotly) # Used for fancier plots
library(htmlwidgets) # Used to save plotly plots
library(GGally) # Used for pairs plots
Next, for Python, we install the packages with pip. Again, I have already done that in the terminal. Prefixing the command with % runs it from within the notebook, but %pip is an IPython/Jupyter magic, so this is a little hacky and only works where such magics are supported.
# %pip install plotly
# %pip install pandas
# %pip install numpy
And we load them.
import numpy as np # For vectors, matrices etc.
import plotly.express as px # For plotting simple graphs
import plotly.graph_objects as go # For plotting more complex graphs
import pandas as pd # For data frames
The first block of code assigns a vector. Note that R's traditional assignment operator, <-, corresponds to a plain = in Python:
# In R
x <- c(1, 3, 2, 5)
x
## [1] 1 3 2 5
In Python, [] creates a list. To get a vector that supports elementwise arithmetic we use an array from the numerical library numpy.
# In Python
x = np.array([1,3,2,5])
x
## array([1, 3, 2, 5])
Similarly for the next vector with just = in
R.
# In R
x = c(1, 6, 2)
x
## [1] 1 6 2
# In Python
x = np.array([1, 6, 2])
x
## array([1, 6, 2])
# In R
y = c(1, 4, 3)
# In Python
y = np.array([1, 4, 3])
Next is the length() function. For numpy arrays the closest equivalent is the shape attribute, which returns a tuple with the size of each dimension.
# In R
length(x)
## [1] 3
length(y)
## [1] 3
And Python.
# In Python
x.shape
## (3,)
y.shape
## (3,)
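For a one-dimensional array, Python's built-in len() and the size attribute give the same count that length() gives in R. A small sketch using the same vector (the variable names n_len, n_size and n_shape are only for illustration):

```python
import numpy as np

x = np.array([1, 6, 2])

n_len = len(x)      # length of the first axis, like length() in R for a vector
n_size = x.size     # total number of elements
n_shape = x.shape   # tuple of axis lengths

print(n_len, n_size, n_shape)
```

For a matrix these three differ: len() counts rows only, while size counts every element.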
And elementwise addition.
# In R
x + y
## [1] 2 10 5
# In Python
x + y
## array([ 2, 10, 5])
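The reason for using numpy arrays rather than plain lists shows up in this addition. As a quick sketch, with a and b as illustrative names, + on lists concatenates, while + on arrays adds element by element as in R:

```python
import numpy as np

a = [1, 6, 2]
b = [1, 4, 3]

joined = a + b                      # lists: + concatenates
summed = np.array(a) + np.array(b)  # arrays: + adds element by element

print(joined)
print(summed)
```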
Next we look at the ls() function. In
Python it can be done with the dir() function.
Python gives a bit more info with some built-in things and
packages.
# In R
ls()
## [1] "x" "y"
# In Python
dir()
## ['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'go', 'np', 'pd', 'px', 'r', 'sys', 'x', 'y']
Now we remove some variables.
# In R
rm(x, y)
ls()
## character(0)
# In Python
del x,y
dir()
## ['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'go', 'np', 'pd', 'px', 'r', 'sys']
And all objects at once.
# In R
rm(list = ls())
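To delete only user-defined variables in Python we remove every name that doesn't start with __. Note that this also removes imported modules such as np and pd, which is why they are imported again later.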
# In Python
for obj in dir():
    if not obj.startswith("__"):
        del globals()[obj]
dir()
## ['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__', 'obj']
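A gentler variant keeps imported modules so they do not have to be re-imported. The sketch below is only one possible approach: clear_user_variables is a hypothetical helper, and it is demonstrated on a stand-in dictionary rather than on the real globals().

```python
import types
import numpy as np

def clear_user_variables(namespace):
    """Remove non-dunder, non-module names from a namespace dict in place."""
    for name in list(namespace):          # copy the keys; we mutate the dict
        if name.startswith("__"):
            continue                      # keep interpreter internals
        if isinstance(namespace[name], types.ModuleType):
            continue                      # keep imported modules such as np
        del namespace[name]
    return namespace

# Demonstrate on a stand-in dict rather than the real globals()
demo = {"__name__": "__main__", "np": np, "x": 1, "y": 2}
clear_user_variables(demo)
print(sorted(demo))
```

Calling clear_user_variables(globals()) at the top level would apply the same rule to the session itself.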
Next we ask for help with the matrix function. (Uncomment if you choose; it opens a pop up.)
# In R
# ?matrix
# In Python
import numpy as np # Import again because we just removed it
# help(np.array)
Now we build some matrices.
# In R
x <- matrix(data = c(1, 2, 3, 4), nrow = 2, ncol = 2)
x
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
For Python it is a little bit different. We can pass a
single vector but then we have to reshape it to be a matrix.
# In Python
x = np.array([1,2,3,4]).reshape(2,2)
x
## array([[1, 2],
## [3, 4]])
Notice that Python reshaped it by filling rows first, whereas R fills columns first. To get the same matrix as in R we need to transpose it.
# In Python
x = np.array([1,2,3,4]).reshape(2,2) \
.transpose()
x
## array([[1, 3],
## [2, 4]])
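R can fill by rows instead with the byrow flag, which reproduces the first Python result. Note that this overwrites x, so from here on x in R is the transpose of x in Python.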
# In R
x = matrix(c(1, 2, 3, 4), 2, 2, byrow = TRUE)
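NumPy has a counterpart to this choice of fill direction: the order argument of reshape(). A minimal sketch, with by_row and by_col as illustrative names:

```python
import numpy as np

values = np.array([1, 2, 3, 4])

by_row = values.reshape(2, 2)                # default C order: fill rows first
by_col = values.reshape(2, 2, order="F")     # Fortran order: fill columns first, like R

print(by_row)
print(by_col)
```

Using order="F" gives the same matrix as reshaping and then transposing.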
Next we square root and square the matrices.
# In R
sqrt(x)
## [,1] [,2]
## [1,] 1.000000 1.414214
## [2,] 1.732051 2.000000
x^2
## [,1] [,2]
## [1,] 1 4
## [2,] 9 16
And Python.
# In Python
np.sqrt(x)
## array([[1. , 1.73205081],
## [1.41421356, 2. ]])
np.square(x)
## array([[ 1, 9],
## [ 4, 16]])
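Next we generate some random numbers. np.random.normal() also defaults to \(\mu = 0\) and \(\sigma = 1\), like rnorm() in R, but because the sample size is its third positional argument we spell out all three here. np.corrcoef() returns the full correlation matrix rather than a single number. The numbers also differ between R and Python because no seed is set and the two use different random number generators.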
# In R
x <- rnorm(50)
y <- x + rnorm(50, mean = 50, sd = .1)
cor(x, y)
## [1] 0.9949742
# In Python
x = np.random.normal(0, 1, 50)
y = x + np.random.normal(50, .1, 50)
np.corrcoef(x,y)
## array([[1. , 0.99301508],
## [0.99301508, 1. ]])
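To get the single number that cor(x, y) reports in R, pick an off-diagonal entry of the matrix. The sketch below uses NumPy's newer Generator interface with an arbitrarily chosen seed, which is an assumption of this example and not what the code above uses:

```python
import numpy as np

rng = np.random.default_rng(0)           # seed chosen arbitrarily for this sketch
x = rng.normal(size=50)                  # loc=0 and scale=1 are the defaults
y = x + rng.normal(loc=50, scale=0.1, size=50)

corr_matrix = np.corrcoef(x, y)          # 2 x 2 matrix
r = corr_matrix[0, 1]                    # the single number that cor(x, y) returns in R

print(r)
```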
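We can set the seed to make the code reproducible. The same seed still gives different numbers in R and Python, because the two seed and generate their normal draws differently.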
# In R
set.seed(3)
y <- rnorm(100)
mean(y)
## [1] 0.01103557
# In Python
np.random.seed(3)
y = np.random.normal(0, 1, 100)
np.mean(y)
## -0.10863707440606224
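What a seed guarantees is repeatability within one library, not agreement across languages. A small sketch, again using the Generator interface as an assumption of this example:

```python
import numpy as np

first = np.random.default_rng(3).normal(size=100)
second = np.random.default_rng(3).normal(size=100)   # same seed, fresh generator
other = np.random.default_rng(4).normal(size=100)    # different seed

print(np.array_equal(first, second))
print(np.array_equal(first, other))
```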
Finally we ask for some variances and standard deviations.
# In R
var(y)
## [1] 0.7328675
sqrt(var(y))
## [1] 0.8560768
sd(y)
## [1] 0.8560768
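Repeat in Python. Note that np.var() and np.std() divide by n by default, whereas var() and sd() in R divide by n - 1; pass ddof=1 to match R.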
# In Python
np.var(y)
## 1.132081888283007
np.sqrt(np.var(y))
## 1.0639933685333791
np.std(y)
## 1.0639933685333791
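R and NumPy use different default denominators for the variance, which a tiny hand-checkable vector makes visible. The values in y below are made up for illustration:

```python
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0])

pop_var = np.var(y)              # divides by n (NumPy default, ddof=0)
sample_var = np.var(y, ddof=1)   # divides by n - 1, matching var() in R
sample_sd = np.std(y, ddof=1)    # matches sd() in R

print(pop_var, sample_var, sample_sd)
```

Here the squared deviations sum to 5, so the two variances are 5/4 and 5/3.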
plotly is a JavaScript-based interactive plotting library. It shares many similarities with ggplot in R. In Python there is a bit more nuance, depending on whether you want the simple interface (plotly.express) or the more detailed one (plotly.graph_objects). First, base R.
x <- rnorm(100)
y <- rnorm(100)
plot(x, y)
Next in fancy R.
# In Fancy R
data.frame(cbind(x,y)) |>
plot_ly(x=x, y=y) |>
add_markers() |>
layout(title="Plot of Y vs X", xaxis=list(title="this is the x-axis"),
yaxis=list(title="this is the y-axis"))
And Python
# In Python
import plotly.express as px # Again we got rid of it too with the delete all
fig = px.scatter(x=np.random.normal(0,1,100), y=np.random.normal(0,1,100),
title="Plot of X vs Y", labels=dict(x="this is the x-axis",
y="this is the y-axis"))
fig.show()
Now to save it.
# In R
library(htmlwidgets) # Back again
p <- data.frame(cbind(x,y)) |>
plot_ly(x=x, y=y) |>
add_markers() |>
layout(title="Plot of X vs Y", xaxis=list(title="this is the x-axis"),
yaxis=list(title="this is the y-axis"))
saveWidget(p, file="scatter.html")
And Python.
# In Python
fig = px.scatter(x=np.random.normal(0,1,100), y=np.random.normal(0,1,100),
title="Plot of X vs Y", labels=dict(x="this is the x-axis",
y="this is the y-axis"))
fig.write_html("scatter2.html")
Next we generate some sequences, using seq() and : in R and range() in Python. Don't forget that Python's range() excludes its stop value, so range(1, 11) runs from 1 to 10. For an evenly spaced grid of real numbers we use the np.linspace() function in Python.
# In R
x <- seq(1, 10)
x <- 1:10
x <- seq(-pi, pi, length = 50)
Python:
# In Python
x = range(1,11)
x = np.linspace(-np.pi, np.pi, 50)
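The endpoint conventions are the main thing to watch when translating seq() calls. A short sketch, with ints, ints_np and grid as illustrative names:

```python
import numpy as np

ints = list(range(1, 11))              # 1, 2, ..., 10 (the stop value 11 is excluded)
ints_np = np.arange(1, 11)             # the same sequence as a numpy array
grid = np.linspace(-np.pi, np.pi, 50)  # 50 evenly spaced points, both endpoints included

print(ints)
print(grid[0], grid[-1], len(grid))
```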
Now that we have a domain, we can define a function and plot it as a contour map as well as a surface.
# In R
y <- x
f <- outer(x, y, function(x, y) cos(y) / (1 + x^2))
contour(x, y, f)
contour(x, y, f, nlevels = 45, add = T)
I chose a different number of levels to follow data visualization standards; the base R version is just too cluttered.
# In R
as.data.frame(cbind(x,y,f)) |>
plot_ly(x=x, y=y) |>
add_contour(z=matrix(f, nrow = length(y), byrow = TRUE),
contours = list(
start=-.8,
end=.8,
size=.1,
showlabels = TRUE,
coloring="lines")) |>
layout(title="Contour plot", xaxis=list(title="x"), yaxis=list(title="y"))
For Python things are similar but we use the
meshgrid function.
# In Python
y = x
xr,yr = np.meshgrid(x,y) # 2-D coordinate grids: every (x, y) pair on a 50 x 50 lattice
f = np.cos(yr) / (1 + xr**2)
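It helps to know which axis of f belongs to which variable. The sketch below rebuilds the same surface and compares it with an equivalent broadcasting expression; f_broadcast is an illustrative name, not part of the original code.

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 50)
y = x

xr, yr = np.meshgrid(x, y)              # default indexing="xy": rows follow y, columns follow x
f = np.cos(yr) / (1 + xr**2)

# The same surface via broadcasting, without building the grids explicitly
f_broadcast = np.cos(y)[:, None] / (1 + x[None, :] ** 2)

print(f.shape)
```

So f[i, j] holds the value at y[i] and x[j], which is the row and column layout plotly expects for z.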
# In Python
import plotly.graph_objects as go # We really shouldn't have deleted everything ...
fig = go.Figure() \
.add_contour(x=x,y=y,z=f, contours=dict(
start=-.8,
end=.8,
size=.1,
showlabels=True,
coloring="lines")) \
.update_layout(title="Contour plot") \
.update_xaxes(title="x") \
.update_yaxes(title="y")
fig.show()
Another surface.
# In R
fa <- (f - t(f)) / 2
contour(x, y, fa, nlevels = 15)
# In R
as.data.frame(cbind(x,y,fa)) |>
plot_ly(x=x, y=y) |>
add_contour(z=matrix(fa, nrow = length(y), byrow = TRUE),
contours = list(
start=-.8,
end=.8,
size=.1,
showlabels = TRUE,
coloring="lines")) |>
layout(title="Contour plot 2", xaxis=list(title="x"), yaxis=list(title="y"))
And Python.
# In Python
fa = (f - f.T) / 2
fig = go.Figure() \
.add_contour(x=x,y=y,z=fa, contours=dict(
start=-.8,
end=.8,
size=.1,
showlabels=True,
coloring="lines")) \
.update_layout(title="Contour plot 2") \
.update_xaxes(title="x") \
.update_yaxes(title="y")
fig.show()
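The quantity (f - f.T) / 2 is the antisymmetric part of f, which is why this plot flips sign across the diagonal. A quick numerical check, rebuilding f as above:

```python
import numpy as np

x = np.linspace(-np.pi, np.pi, 50)
xr, yr = np.meshgrid(x, x)
f = np.cos(yr) / (1 + xr**2)

fa = (f - f.T) / 2          # antisymmetric part of f

print(np.allclose(fa.T, -fa))
print(np.allclose(np.diag(fa), 0))
```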
Let’s fill things in.
# In R
as.data.frame(cbind(x,y,fa)) |>
plot_ly(x=x, y=y) |>
add_contour(z=matrix(fa, nrow = length(y), byrow = TRUE),
contours = list(
start=-.8,
end=.8,
size=.1,
showlabels = TRUE,
labelfont=list(color="black"))) |>
layout(title="Image plot", xaxis=list(title="x"), yaxis=list(title="y"))
# In Python
fig = go.Figure() \
.add_contour(x=x,y=y,z=fa, contours=dict(
start=-.8,
end=.8,
size=.1,
showlabels=True,
labelfont=dict(color="black"))) \
.update_layout(title="Image plot") \
.update_xaxes(title="x") \
.update_yaxes(title="y")
fig.show()
Going 3D and adding some perspective.
# In R
as.data.frame(cbind(x,y,fa)) |>
plot_ly(x=x, y=y) |>
add_surface(z=matrix(fa, nrow = length(y), byrow = TRUE)) |>
layout(title="Perspective plot", scene=list(xaxis=list(title="x"), yaxis=list(title="y")))
# In Python
# For 3D traces the axis titles belong to the scene, not to xaxis/yaxis
fig = go.Figure() \
.add_surface(x=x,y=y,z=fa) \
.update_layout(title="Perspective plot",
scene=dict(xaxis_title="x", yaxis_title="y"))
fig.show()
Now we slice and dice the arrays. Remember that R starts counting at 1 and Python at 0, and that a Python slice excludes its end index. First define A.
# In R
A <- matrix(1:16, 4, 4)
A
## [,1] [,2] [,3] [,4]
## [1,] 1 5 9 13
## [2,] 2 6 10 14
## [3,] 3 7 11 15
## [4,] 4 8 12 16
Python:
# In Python
A = np.array(range(1,17)).reshape(4,4).transpose()
A
## array([[ 1, 5, 9, 13],
## [ 2, 6, 10, 14],
## [ 3, 7, 11, 15],
## [ 4, 8, 12, 16]])
Slice away.
# In R
A[2, 3]
## [1] 10
# In Python
A[1,2]
## 10
# In R
A[c(1, 3), c(2, 4)]
## [,1] [,2]
## [1,] 5 13
## [2,] 7 15
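This is less direct in Python. Passing the two index lists on their own would pair them up elementwise, so we use the ix_ function to select whole rows crossed with whole columns.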
# In Python
A[(np.ix_([0,2], [1,3]))]
## array([[ 5, 13],
## [ 7, 15]])
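The difference between paired indexing and np.ix_ is easy to see side by side. In this sketch A is rebuilt with order="F" so the block runs on its own; paired and block are illustrative names.

```python
import numpy as np

A = np.arange(1, 17).reshape(4, 4, order="F")   # same matrix as A above

paired = A[[0, 2], [1, 3]]           # picks A[0, 1] and A[2, 3] only
block = A[np.ix_([0, 2], [1, 3])]    # rows 0 and 2 crossed with columns 1 and 3

print(paired)
print(block)
```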
# In R
A[1:3, 2:4]
## [,1] [,2] [,3]
## [1,] 5 9 13
## [2,] 6 10 14
## [3,] 7 11 15
# In Python
A[:3, 1:4]
## array([[ 5, 9, 13],
## [ 6, 10, 14],
## [ 7, 11, 15]])
# In R
A[1:2, ]
## [,1] [,2] [,3] [,4]
## [1,] 1 5 9 13
## [2,] 2 6 10 14
# In Python
A[:2,:]
## array([[ 1, 5, 9, 13],
## [ 2, 6, 10, 14]])
# In R
A[, 1:2]
## [,1] [,2]
## [1,] 1 5
## [2,] 2 6
## [3,] 3 7
## [4,] 4 8
# In Python
A[:, :2]
## array([[1, 5],
## [2, 6],
## [3, 7],
## [4, 8]])
# In R
A[1, ]
## [1] 1 5 9 13
# In Python
A[0,:]
## array([ 1, 5, 9, 13])
# In R
A[-c(1, 3), ]
## [,1] [,2] [,3] [,4]
## [1,] 2 6 10 14
## [2,] 4 8 12 16
# In Python
np.delete(A, [0,2], 0)
## array([[ 2, 6, 10, 14],
## [ 4, 8, 12, 16]])
# In R
A[-c(1, 3), -c(1,3,4)]
## [1] 6 8
# In Python
np.delete(np.delete(A, [0,2], 0), [0,2,3], 1).flatten()
## array([6, 8])
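Negative indices mean different things in the two languages: R drops the listed rows, while Python counts from the end. Besides np.delete(), a boolean mask is another way to drop rows. A minimal sketch, with keep and dropped as illustrative names:

```python
import numpy as np

A = np.arange(1, 17).reshape(4, 4, order="F")

last_row = A[-1]                     # in Python, -1 selects the last row; it does not drop anything

keep = np.ones(A.shape[0], dtype=bool)
keep[[0, 2]] = False                 # mark rows 0 and 2 for removal
dropped = A[keep]                    # same result as np.delete(A, [0, 2], 0)

print(last_row)
print(dropped)
```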
# In R
dim(A)
## [1] 4 4
# In Python
A.shape
## (4, 4)
Now let’s get to some data. We read in a data set about cars from the website https://hastie.su.domains/ISLR2/Labs/ and take a look.
# In R
Auto <- read.csv("Auto.csv", header = T, na.strings = "?", stringsAsFactors = T)
View(Auto)
head(Auto)
## mpg cylinders displacement horsepower weight acceleration year origin
## 1 18 8 307 130 3504 12.0 70 1
## 2 15 8 350 165 3693 11.5 70 1
## 3 18 8 318 150 3436 11.0 70 1
## 4 16 8 304 150 3433 12.0 70 1
## 5 17 8 302 140 3449 10.5 70 1
## 6 15 8 429 198 4341 10.0 70 1
## name
## 1 chevrolet chevelle malibu
## 2 buick skylark 320
## 3 plymouth satellite
## 4 amc rebel sst
## 5 ford torino
## 6 ford galaxie 500
dim(Auto)
## [1] 397 9
# In Python
import pandas as pd
Auto = pd.read_csv("Auto.csv")
Auto.head()
## mpg cylinders displacement ... year origin name
## 0 18.0 8 307.0 ... 70 1 chevrolet chevelle malibu
## 1 15.0 8 350.0 ... 70 1 buick skylark 320
## 2 18.0 8 318.0 ... 70 1 plymouth satellite
## 3 16.0 8 304.0 ... 70 1 amc rebel sst
## 4 17.0 8 302.0 ... 70 1 ford torino
##
## [5 rows x 9 columns]
Auto.shape
## (397, 9)
# In R
Auto[1:4, ]
## mpg cylinders displacement horsepower weight acceleration year origin
## 1 18 8 307 130 3504 12.0 70 1
## 2 15 8 350 165 3693 11.5 70 1
## 3 18 8 318 150 3436 11.0 70 1
## 4 16 8 304 150 3433 12.0 70 1
## name
## 1 chevrolet chevelle malibu
## 2 buick skylark 320
## 3 plymouth satellite
## 4 amc rebel sst
Now dplyr comes into the picture.
# Fancy R
Auto |> slice(1:4)
## mpg cylinders displacement horsepower weight acceleration year origin
## 1 18 8 307 130 3504 12.0 70 1
## 2 15 8 350 165 3693 11.5 70 1
## 3 18 8 318 150 3436 11.0 70 1
## 4 16 8 304 150 3433 12.0 70 1
## name
## 1 chevrolet chevelle malibu
## 2 buick skylark 320
## 3 plymouth satellite
## 4 amc rebel sst
# In Python
Auto.iloc[:4,:]
## mpg cylinders displacement ... year origin name
## 0 18.0 8 307.0 ... 70 1 chevrolet chevelle malibu
## 1 15.0 8 350.0 ... 70 1 buick skylark 320
## 2 18.0 8 318.0 ... 70 1 plymouth satellite
## 3 16.0 8 304.0 ... 70 1 amc rebel sst
##
## [4 rows x 9 columns]
# In R
Auto <- na.omit(Auto)
dim(Auto)
## [1] 392 9
# Fancy R
Auto |> na.omit() |>
dim()
## [1] 392 9
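The Python count below stays at 397: read_csv() above was not told that "?" marks a missing value (the R call used na.strings = "?"), so dropna() finds nothing to drop. The result is also not assigned back to Auto.
# In Python
Auto.dropna().shape
## (397, 9)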
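The pandas counterpart of na.strings in R is the na_values argument of read_csv(). The sketch below uses a made-up four-row miniature of the file (csv_text is hypothetical, not the real Auto.csv) to show the effect on dropna():

```python
import io
import pandas as pd

# Hypothetical miniature of Auto.csv with one "?" in horsepower
csv_text = """mpg,horsepower,name
18.0,130,chevrolet chevelle malibu
15.0,165,buick skylark 320
25.0,?,ford pinto
16.0,150,amc rebel sst
"""

raw = pd.read_csv(io.StringIO(csv_text))                    # "?" kept as text
clean = pd.read_csv(io.StringIO(csv_text), na_values="?")   # "?" read as NaN

print(raw.dropna().shape)      # nothing is dropped
print(clean.dropna().shape)    # the row with the missing value is dropped
```

With na_values set, horsepower is also read as a numeric column instead of text.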
# In R
names(Auto)
## [1] "mpg" "cylinders" "displacement" "horsepower" "weight"
## [6] "acceleration" "year" "origin" "name"
# Fancy R
Auto |> names()
## [1] "mpg" "cylinders" "displacement" "horsepower" "weight"
## [6] "acceleration" "year" "origin" "name"
# In Python
Auto.columns
## Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
## 'acceleration', 'year', 'origin', 'name'],
## dtype='object')
Now let’s do some more plotting.
# In R
plot(Auto$cylinders, Auto$mpg)
# Fancy R
Auto |> plot_ly(x=~cylinders, y=~mpg) |>
add_markers() |>
layout(title="Number of Cylinders vs Miles per Gallon",
xaxis=list(title="Cylinders"),
yaxis=list(title="MPG"))
# In Python
fig = px.scatter(Auto, x="cylinders", y="mpg",
title="Number of Cylinders vs Miles per Gallon",
labels=dict(cylinders="Cylinders",
mpg="MPG"))
fig.show()
You can attach() the data frame if you are so inclined; that just makes the column names available as variables in R. The plot remains the same. The document from the textbook website draws many box plots, changing one option at a time. I have combined them into one so there are fewer to look at. Note that plotly does not have the varwidth option. Instead I added jittered points so you can see how many observations there are.
# In R
attach(Auto)
## The following object is masked from package:ggplot2:
##
## mpg
cylinders <- as.factor(cylinders)
plot(cylinders, mpg, col = "red", varwidth = T)
# Fancy R
Auto |> mutate(cylinders = as.factor(cylinders)) |>
plot_ly(x=~mpg, y=~cylinders, color="red") |>
add_boxplot(line=list(color="red"),marker=list(color="red"), boxpoints="all", jitter=.3 ) |>
layout(title="Number of Cylinders vs Miles per Gallon",
xaxis=list(title="MPG"),
yaxis=list(title="Cylinders"))
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# In Python
Auto['cylinders'] = Auto['cylinders'].astype("object")
fig = px.box(Auto, x="mpg", y="cylinders", points="all",
title="Number of Cylinders vs Miles per Gallon",
labels=dict(cylinders="Cylinders", mpg="MPG"))
fig.show()
And now for some histograms. Again the lab plots many of them; I will plot a single version that combines all the options.
# In R
hist(mpg, col = 2, breaks = 15)
# Fancy R
Auto |> plot_ly(x=~mpg) |>
add_histogram(nbinsx=15, color="red", stroke=list(color="black")) |>
layout(title="Histogram of MPG",
xaxis=list(title="MPG"),
yaxis=list(title="Frequency"))
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# In Python
fig = px.histogram(Auto, x="mpg", nbins=15,
title="Histogram of MPG",
labels=dict(mpg="MPG"))
fig.show()
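To see what a bin count of 15 means numerically, np.histogram() returns the counts and bin edges directly. The data below are simulated stand-ins, not the mpg column, and the seed is arbitrary. As far as I know, plotly treats the requested number of bins as an upper guide rather than an exact count, whereas np.histogram() uses it exactly.

```python
import numpy as np

rng = np.random.default_rng(1)                       # stand-in data, not the Auto mpg column
values = rng.normal(loc=23, scale=8, size=392)

counts, edges = np.histogram(values, bins=15)        # exactly 15 equal-width bins

print(counts.sum(), len(edges))
```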
Finally the pairs plot. Let’s not plot every variable as the lab does, because the result is overwhelming.
# In R
pairs(
~ mpg + displacement + horsepower + weight + acceleration,
data = Auto
)
And now for an upgrade.
# Fancy R
(ggpairs(Auto, columns = c(1,3:6), title="Pairs Plot")) |> ggplotly()
## Warning: Can only have one: highlight
## Warning: Can only have one: highlight
## Warning: Can only have one: highlight
## Warning: Can only have one: highlight
# In Python
fig = px.scatter_matrix(Auto[["mpg","displacement","horsepower","weight", "acceleration"]],
title="Pairs Plot", width=1000, height=1000)
## C:\Users\COLINJ~1\AppData\Local\R-MINI~1\envs\R-RETI~1\lib\site-packages\plotly\express\_core.py:279: FutureWarning:
##
## iteritems is deprecated and will be removed in a future version. Use .items instead.
fig.show()
For the last part we summarize the data frame and its variables. First, the whole frame.
# In R
summary(Auto)
## mpg cylinders displacement horsepower weight
## Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0 Min. :1613
## 1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0 1st Qu.:2225
## Median :22.75 Median :4.000 Median :151.0 Median : 93.5 Median :2804
## Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5 Mean :2978
## 3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0 3rd Qu.:3615
## Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0 Max. :5140
##
## acceleration year origin name
## Min. : 8.00 Min. :70.00 Min. :1.000 amc matador : 5
## 1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000 ford pinto : 5
## Median :15.50 Median :76.00 Median :1.000 toyota corolla : 5
## Mean :15.54 Mean :75.98 Mean :1.577 amc gremlin : 4
## 3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000 amc hornet : 4
## Max. :24.80 Max. :82.00 Max. :3.000 chevrolet chevette: 4
## (Other) :365
# Fancy R
Auto |> summary()
## mpg cylinders displacement horsepower weight
## Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0 Min. :1613
## 1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0 1st Qu.:2225
## Median :22.75 Median :4.000 Median :151.0 Median : 93.5 Median :2804
## Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5 Mean :2978
## 3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0 3rd Qu.:3615
## Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0 Max. :5140
##
## acceleration year origin name
## Min. : 8.00 Min. :70.00 Min. :1.000 amc matador : 5
## 1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000 ford pinto : 5
## Median :15.50 Median :76.00 Median :1.000 toyota corolla : 5
## Mean :15.54 Mean :75.98 Mean :1.577 amc gremlin : 4
## 3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000 amc hornet : 4
## Max. :24.80 Max. :82.00 Max. :3.000 chevrolet chevette: 4
## (Other) :365
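The Python frame still has all 397 rows, and describe() skips non-numeric columns, which here include horsepower (read as text) and cylinders (cast to object earlier). That is why it reports six columns and statistics that differ slightly from R's.
# In Python
Auto.describe()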
## mpg displacement ... year origin
## count 397.000000 397.000000 ... 397.000000 397.000000
## mean 23.515869 193.532746 ... 75.994962 1.574307
## std 7.825804 104.379583 ... 3.690005 0.802549
## min 9.000000 68.000000 ... 70.000000 1.000000
## 25% 17.500000 104.000000 ... 73.000000 1.000000
## 50% 23.000000 146.000000 ... 76.000000 1.000000
## 75% 29.000000 262.000000 ... 79.000000 2.000000
## max 46.600000 455.000000 ... 82.000000 3.000000
##
## [8 rows x 6 columns]
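By default describe() reports numeric columns only, which is why name never appears here while summary() in R tabulates it. A small sketch on a made-up three-row frame (df is hypothetical) shows the include="all" option:

```python
import pandas as pd

# Hypothetical three-row frame with one numeric and one text column
df = pd.DataFrame({
    "mpg": [18.0, 15.0, 16.0],
    "name": ["ford torino", "amc rebel sst", "ford torino"],
})

numeric_only = df.describe()              # default: numeric columns only
everything = df.describe(include="all")   # adds count/unique/top/freq for text columns

print(list(numeric_only.columns))
print(everything.loc["unique", "name"])
```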
And with one variable.
# In R
summary(mpg)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.00 17.00 22.75 23.45 29.00 46.60
# Fancy R
Auto |> select(mpg) |>
summary()
## mpg
## Min. : 9.00
## 1st Qu.:17.00
## Median :22.75
## Mean :23.45
## 3rd Qu.:29.00
## Max. :46.60
# In Python
Auto["mpg"].describe()
## count 397.000000
## mean 23.515869
## std 7.825804
## min 9.000000
## 25% 17.500000
## 50% 23.000000
## 75% 29.000000
## max 46.600000
## Name: mpg, dtype: float64